Publishing the Trove Newspaper Corpus

Author

  • Steve Cassidy
Abstract

The Trove Newspaper Corpus is derived from the National Library of Australia’s digital archive of newspaper text. The corpus is a snapshot of the NLA collection taken in 2015 and made available for language research as part of the Alveo Virtual Laboratory; it contains 143 million articles dating from 1806 to 2007. This paper describes the work we have done to make this large corpus available as a research collection, both by facilitating access to individual documents and by enabling large-scale processing of the newspaper text in a cloud-based environment.

Similar resources

Manipulative Propaganda Techniques

Influencing the public attitude towards certain topics has become one of the strongest weapons in today’s information warfare. The ability to recognize the presence of propaganda in newspaper texts is thus highly valued, yet it is not directly transferable to algorithmic analysis. In the current paper, we present the first steps of the project aiming at detection and recognition of select...

English and Persian Sport Newspaper Headlines: A comparative study of linguistic means

Abstract: Using rhetorical figures in specialized languages such as the language of newspaper headlines is common. The present study attempted a contrastive analysis of English and Persian sport newspaper headlines related to the 2014 FIFA World Cup. To this end, a corpus consisting of 400 English and 400 Persian headlines published between 12 June and 13 July 2014 was c...

Reflection of Knowledge and Information Science’s News in the Press: A Case Study of Iran Newspaper

Background and Aim: The present study aims to explore the coverage and reflection of Knowledge and Information Science news in the Iranian press. Iran Newspaper which is one of the main public newspapers in the country has been selected as the case for this study. Method: This study used content analysis as its research methodology and adopted an inductive approach in data analysis. All the pag...

New production models for newspaper organizations

Information delivery is undergoing profound changes. The established media such as radio, television, and newspapers are faced with a variety of new digital content formats. New dimensions of publishing can be exploited in the case of newspaper production. This paper investigates the changes in the production models of newspaper organizations caused by the introduction of information technology...

Publication date: 2016